Can eSpeak be embedded?

Category: Geeks r Us

Post 1 by Eleni21 (I have proven to myself and the world that I need mental help) on Wednesday, 31-Mar-2010 14:52:36

Whenever I go to sites with products for the blind, I always see tons of talking things. Some are games, others are necessary household appliances or tools for independence. But when I go to the few Greek blind sites that I could find, talking products virtually don't exist. The only things I've found so far are a bathroom scale, a blood pressure monitor and a watch. I'd like to change this by creating relatively cheap talking products for the Greek market.

Of course, the first step is to decide which products to produce. This is probably best done through a combination of a survey and research. I'd also need to find people to help in this venture, since I'm not any kind of technician or programmer. But I really need to figure out what synthesizer to use or it'll all be worthless. I'd like to keep the cost down as much as possible, both for myself/the company as the manufacturer and for the consumer, so the professional synths are probably out. But I know that eSpeak is open source. Can it be embedded in things that don't have an operating system? If not, would there be a way to put my voice or something on chips, particularly for items which only say a few things?

Any help would be sincerely appreciated. This really means a lot to me, since I'd be helping blind and visually impaired Greeks all over the world. Btw, I'm not even considering patenting any of this, since the cost to do so is astronomical. Also, I love the NVDA eSpeak voices, though I haven't heard the really new ones yet. Michel is my favourite. How do I get regular eSpeak to use that voice, and also to use the nice inflection that exists within NVDA?

Post 2 by LeoGuardian (You mean there is something outside of this room with my computer in it?) on Wednesday, 31-Mar-2010 15:51:38

Eleni,

Like many of your ambitions, the first problem here is getting a basic understanding of how some of this works.
First, any time you have a digital device you will have some form of OS or firmware, proprietary or otherwise, for the express purpose of handling I/O. You probably could embed eSpeak, but you would first need to decide how TTS is going to work in your device. Is your device only going to output a fixed set of messages? If so, you could just use audio files and write them out to the audio device (the wave-device equivalent of writing to stdout).
Otherwise, you would first have to port eSpeak to the platform you are using, which may require compromises to match the footprint, and at the user level (UI) determine how messages are to be output, e.g. are you going to output full screens of data, or is this a one-line display that changes as the user acts...
In other words, there is a lot more to this than just porting a particular speech synthesizer (if you even need to) to a given platform. Of course, if you're just playing a few (hundred) messages, all of which you know ahead of time, e.g. a talking thermometer or scale, you will probably be better served by prerecorded files. Remember, a TTS engine is ... an engine ... and that's what a synth is. Audio files are just data to be sent to a particular type of output device, namely sound.
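To make that concrete, here is a rough sketch, in C, of what the prerecorded-message approach could look like. Every name in it is made up, and play_wav() in particular stands in for whatever audio call your board's SDK actually provides:

/* Minimal sketch of the prerecorded-message approach. */
#include <stdio.h>

enum message_id {
    MSG_WELCOME,
    MSG_LOW_BATTERY,
    MSG_WEIGHT_PREFIX,   /* recorded phrase: "Your weight is" */
    MSG_COUNT
};

/* One recorded Greek phrase per message, stored in flash or on a card. */
static const char *message_files[MSG_COUNT] = {
    "welcome_el.wav",
    "low_battery_el.wav",
    "weight_prefix_el.wav",
};

/* Desktop stand-in so the sketch compiles; on hardware this would push
   the file's samples to a DAC or audio codec. */
static void play_wav(const char *path)
{
    printf("[playing %s]\n", path);
}

static void say(enum message_id id)
{
    if (id < MSG_COUNT)
        play_wav(message_files[id]);
}

int main(void)
{
    say(MSG_WELCOME);   /* spoken once at power-on */
    return 0;
}

The point is just that each phrase is a file you recorded ahead of time, looked up by number; no synthesizer is involved at all.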
In short, learn about embedded systems; you'll need an electrical engineer and an embedded systems developer, just to name a couple. I've done my share of work on mobile devices, and I'm here to tell you the embedded platform is very, very different from the desktop.
As to your other question about the variants / inflection, look it up on their site. It is open source, and thus available. Whether or not their SAPI implementation supports those depends upon their implementation, and perhaps on SAPI itself. I haven't done any SAPI programming aside from some basic tomfoolery, but Microsoft has plenty on it on MSDN.
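If you do end up driving eSpeak directly rather than through a screen reader, the sketch below shows roughly how a voice plus variant is selected through its C API (speak_lib.h). I'm assuming the Greek voice "el" and the stock "m3" variant here; the variants you hear in NVDA, such as Michel, may not ship with a standalone eSpeak install, so check the voices/!v folder of your copy. The include path can also differ between installs:

#include <string.h>
#include <espeak/speak_lib.h>   /* may just be "speak_lib.h" on some systems */

int main(void)
{
    const char *text = "Καλημέρα";   /* "Good morning" */

    /* Initialise for direct playback; returns the sample rate, or -1 on error. */
    if (espeak_Initialize(AUDIO_OUTPUT_PLAYBACK, 0, NULL, 0) < 0)
        return 1;

    /* Language "el" (Greek) plus a variant name, joined with '+'. */
    espeak_SetVoiceByName("el+m3");

    espeak_Synth(text, strlen(text) + 1, 0, POS_CHARACTER, 0,
                 espeakCHARS_UTF8, NULL, NULL);
    espeak_Synchronize();   /* wait until speech has finished */
    espeak_Terminate();
    return 0;
}

Whether any of that carries over to the SAPI voice is, as I said, down to their SAPI layer.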
All these resources are available.

Post 3 by Eleni21 (I have proven to myself and the world that I need mental help) on Wednesday, 31-Mar-2010 18:25:03

Beautiful! Now I've just saved myself loads of headaches by posting here. I'll simply use voice output via wave files or some such. I'm a lot more familiar and comfortable with analogue tech, but I doubt it could really work in this setting. The files, though, shouldn't be too difficult. As for the other stuff, I have no clue how that would be done.

Post 4 by wildebrew (We promised the world we'd tame it, what were we hoping for?) on Wednesday, 31-Mar-2010 19:07:34

A few other things to consider:
Are you sure these products do not exist?
Are you in touch with anyone in Greece who oversees these things? (The head of the Aegis project, which is probably one of the best funded accessibility-oriented open source programmes today, is a Greek professor, so they have lots of super qualified people.)
Is it worth it to do such work? Don't most people speak English well enough to use things like scales without having to translate the speech output?
Do you really speak Greek well enough to recognize a good Greek voice and accent? (Not saying you don't, simply wondering if you do.)
Before embarking on some major project you think is good for someone, you have to check whether the people you are doing it for think so too; that has been a major flaw of way too many charity programmes in recent history.
One simple thing that could be done is to support eSpeak improvements for the language by providing feedback. I am currently doing this for Icelandic, working directly with the developer, and I am even in the process of securing some funding for him to improve the support.
Compare this to the cost of developing a high-quality TTS engine, especially if you go into concatenative engines: a few thousand dollars can go a very long way in open source, whereas your typical TTS engine built from the ground up costs between 250 and 600 thousand dollars, depending on provider and quality.
Also, even if you use a WAV file you are not using analog technology at all; it's still digital, so that argument does not apply.

Post 5 by Eleni21 (I have proven to myself and the world that I need mental help) on Wednesday, 31-Mar-2010 22:32:34

I would like to carry out a survey among blind Greeks to see what they want in technology. If I find that, as you said, they don't need or want these products, then of course continuing would be a waste of time. I also intend to contact various blind-related organisations, not simply for this, but for my own future. It would be a very good idea for me to connect with blind people as well as with those who can support me with tips and tricks about independence there, such as mobility (very important if I'm going to Athens, which I hope I won't be, since it's extremely dangerous), differences, if any, between here and Greece that would affect my life as a blind person, etc.

I've never heard of the Aegis project but am highly impressed that the head of it is Greek. Perhaps we can talk about why there's such a lack of technology aimed at the Greek market. If the reason really is English, then we must be leaps and bounds ahead of everyone else in Europe, because I see tons of languages in various products, not necessarily like the ones I'm discussing here, but still more than we have. I'm at the advanced level of my course. Obviously there are still many things that I don't know, so I can't claim to be a fluent speaker. But I've been told that my accent is a native one. I also have a very keen ear for such things and can easily pick out not only good synthesizers, but in certain cases the fact that three different people come from three different regions of the country.

Of course, wave files are not analogue. I'm sorry if it appeared that I was saying they were. I simply meant that there's probably also a way to do analogue recordings, and that I'm more familiar with the process and technical involvement there than with digital things. But in this instance it probably wouldn't work, and in any case, this doesn't sound too difficult. My only question would be how the voice would know when to speak. That is, if it's something that requires user interaction in some way, how would I be able to have the voice say what it's supposed to say when it's supposed to say it? But that, if anything, would be for the future, not for now.

Post 6 by LeoGuardian (You mean there is something outside of this room with my computer in it?) on Thursday, 01-Apr-2010 0:03:53

The voice would "know when to speak" based on your UI code, i.e. the code that manages what is going to and from the user.
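If it helps to picture it, here is a tiny made-up example in the same spirit as the earlier sketch: a loop (the "UI code") that waits for a button press and only then plays the relevant recordings. All the functions are desktop stand-ins for real hardware calls, and it runs once so it terminates; real firmware would loop forever waiting for events:

#include <stdbool.h>
#include <stdio.h>

/* Stand-ins so the sketch compiles; on hardware these would read a GPIO
   pin, read the scale's sensor, and drive the audio output. */
static bool button_pressed(void)       { return true; }
static int  read_weight_grams(void)    { return 72400; }
static void play_wav(const char *path) { printf("[playing %s]\n", path); }

int main(void)
{
    if (button_pressed()) {
        int grams = read_weight_grams();
        play_wav("weight_prefix_el.wav");              /* recorded: "Your weight is" */
        printf("[speak number: %d]\n", grams / 1000);  /* whole kilograms */
        play_wav("kilograms_el.wav");                  /* recorded: "kilograms" */
    }
    return 0;
}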

Post 7 by Eleni21 (I have proven to myself and the world that I need mental help) on Thursday, 01-Apr-2010 0:09:43

Gotcha. Thanks.